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Abstract 

As part of an effort to develop a systematic methodology for earthquake forecasting, we use a 
simple model of seismicity based on interacting events which may trigger a cascade of 
earthquakes, known as the Epidemic- Type Aftershock Sequence model (ETAS). The ETAS 
model is constructed on a bare (unrenormalized) Omori law, the Gutenberg-Richter law and the 
idea that large events trigger more numerous aftershocks. For simplicity, we do not use the 
information on the spatial location of earthquakes and work only in the time domain. We 
demonstrate the essential role played by the cascade of triggered seismicity in controlling the 
rate of aftershock decay as well as the overall level of seismicity in the presence of a constant 
external seismicity source. We offer an analytical approach to account for the yet unobserved 
triggered seismicity adapted to the problem of forecasting future seismic rates at varying 
horizons from the present. Tests presented on synthetic catalogs validate strongly the 
importance of taking into account all the cascades of still unobserved triggered events in order to 
predict correctly the future level of seismicity beyond a few minutes. We find a strong 
predictability if one accepts to predict only a small fraction of the large-magnitude targets. 
Specifically, we find a prediction gain (defined as the ratio of the fraction of predicted events over 
the fraction of time in alarms) equal to 21 for a fraction of alarm of 1%, a target magnitude 
M > 6, an update time of 0.5 days between two predictions, and for realistic parameters of the 
ETAS model. However, the probability gains degrade fast when one attempts to predict a larger 
fraction of the targets. This is because a significant fraction of events remain uncorrelated from 
past seismicity. This delineates the fundamental limits underlying forecasting skills, stemming 
from an intrinsic stochastic component in these interacting triggered seismicity models. 
Quantitatively, the fundamental limits of predictability found here are only lower bounds of the 
true values corresponding to the full information on the spatial location of earthquakes. 
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1. Introduction 

There are several well documented facts in seis- 
micity: (1) spatial clustering of earthquakes at many 
scales, [e.g. Kagan and Knopoff, 1980] (2) the Gutenberg- 
Richter (GR) [Gutenberg and Richter, 1944] distri- 
bution of earthquake magnitudes and (3) clustering 
in time following large earthquakes, quantified by 
Omori's « l/t p law for aftershocks (with p « 1) 
[Omori, 1894]. However, there are some deviations 
from these empirical laws, and a significant variabil- 
ity in the parameters of these laws. The 6-value of 
the GR law, the p Omori exponent and the after- 
shock productivity are spatially and temporally vari- 
able [Utsu et al, 1995; Guo and Ogata, 1997]. Some 
alternative laws have also been proposed, such as a 
gamma law for the magnitude distribution [Kagan, 
1999] or the stretched exponential for the temporal 
decay of the rate of aftershocks [Kisslinger, 1993]. 

However, these "laws" are only the beginning of 
a full model of seismic activity and earthquake trig- 
gering. In principle, if one could obtain a faithful 
representation (model) of the spatio-temporal orga- 
nization of seismicity, one could use this model to 
develop algorithms for forecasting earthquakes. The 
ultimate quality of these forecasts would be limited 
by the quality of the model, the amount of data that 
can be used in the forecast and its reliability and pre- 
cision, and the stochastic component of seismic ac- 
tivity. Here, we analyze a simple model of seismic- 
ity known as the Epidemic Type Aftershock Sequence 
model (ETAS) [Ogata, 1988, 1989] and we use it to 
test the fundamental limits of predictability of this 
class of models. We restrict our analysis to the time 
domain, that is, we neglect the information provided 
by the spatial location of earthquakes which could be 
used to constrain the correlation between events and 
would be expected to improve forecasting skills. Our 
results should thus give lower bounds of the achievable 
predictive skills. This exercise is rather constrained 
but turns out to provide meaningful and useful in- 
sights. Because our goal is to estimate the intrinsic 
limits of predictability in the ETAS model, indepen- 
dently of the additional errors coming from the un- 
certainty in the estimation of the ETAS parameters 
in real data, we consider only synthetic catalogs gen- 
erated with the ETAS model. Thus, the only source 
of error of the prediction algorithms result from the 
stochasticity of the model. 

Before presenting the model and developing the 
tools necessary for the prediction of future seismicity, 



we briefly summarize in the next section the avail- 
able methods of earthquake forecasting based on past 
seismicity. In section 3, we present the model of in- 
teracting triggered seismicity used in our analysis. 
Section 4 develops the formal solution of the prob- 
lem of forecasting future seismicity rates based on 
the knowledge of past seismicity quantified by a cat- 
alog of times of occurrences and magnitudes of earth- 
quakes. Section 5 gives the results of an intensive se- 
ries of tests, which quantify in several alternative ways 
the quality of forecasts (regression of predicted ver- 
sus realized seismicity rate, error diagrams, probabil- 
ity gains, information-based binomial scores). Com- 
parisons with the Poisson null-hypothesis give a very 
significant success rate. However, only a small frac- 
tion of the large-magnitude targets can be shown to 
be successfully predicted while the probability gain 
deteriorates rapidly when one attempts to predict a 
larger fraction of the targets. We provide a detailed 
discussion of these results and of the influence of the 
model parameters. Section 6 concludes. 

2. A rapid tour of methods of 
earthquake forecasts based on past 
seismicity 

All the algorithms that have been developed for the 
prediction of future large earthquakes based on past 
seismicity rely on their characterization cither as wit- 
nesses or as actors. In other words, these algorithms 
assume that past seismicity is related in some way to 
the approach of a large scale rupture. 

2.1. Pattern recognition (M8) 

In their pioneering work, Keilis-Borok and Mali- 
novskaya [1964] codified an observation of general in- 
crease in seismic activity depicted with the only mea- 
sure, "pattern Sigma", which is characterizing the 
trailing total sum of the source areas of the medium 
size earthquakes. Similar strict codifications of other 
seismicity patterns, such as a decrease of b- value, an 
increase in the rate of activity, an anomalous num- 
ber of aftershocks, etc, have been proposed later and 
define the M8 algorithm of earthquake forecast (see 
[Keilis-Borok and Kossobokov, 1990a; Keilis-Borok 
and Soloviev, 2002] for useful reviews). In these al- 
gorithms, an alarm is defined when several precur- 
sory patterns are above a threshold calibrated in a 
training period. Predictions are updated periodi- 
cally as new data become available. Most of the 
patterns used by this class of algorithms are repro- 
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duced by the model of triggered seismicity known 
as the ETAS (Epidemic Type Aftershock Sequence) 
model [see Sornette and Sornette, 1999; Helmstetter 
and Sornette, 2002; Helmstetter et al, 2003]. The 
prediction gain G of the M8 algorithm, defined as the 
ratio between the fraction of predicted events over the 
fraction of time occupied by alarms, is usually in the 
range 3 to 10 (recall that a random predictor would 
give G = 1 by definition). A preliminary forward 
test of the algorithm for the time period July 1991 to 
June 1995 performed no better than the null hypoth- 
esis using a reshuffling of the alarm windows [Kos- 
sobokov et al., 1997]. Later tests indicated however a 
confidence level of 92% for the prediction of M7.5+ 
earthquakes by the algorithm M8-MSc for real-time 
intermediate-term predictions in the Circum Pacific 
seismic belt, 1992-1997, and above 99% for the predic- 
tion of M > 8 earthquakes [Kossobokov et al, 1999]. 
We use the term confidence level as 1 minus the prob- 
ability of observing a predictability at least as good as 
what was actually observed, under the null hypothe- 
sis that everything is due to chance alone. As of July 
2002, the scores (counted from the formal start of the 
global test initiated by this team since July 1991) are 
as follows: For M8.0+, 8 events occurred, 7 predicted 
by M8, 5 predicted by M8-MSc; for M7.5+, 25 events 
occurred, 15 predicted by M8 and 7 predicted by M8- 
MSc. 

2.2. Short term forecast of aftershocks 

Reasenberg and Jones [1989] and Wiemer [2000] 
have developed algorithms to predict the rate of af- 
tershocks following major earthquakes. The rate of 
aftershocks of magnitude m following an earthquake 
of magnitude M is estimated by the expression 



where b « 1 is the 6-value of the Gutenberg-Richtcr 
(GR) distribution. This approach neglects the contri- 
bution of seismicity prior to the mainshock, and does 
not take into account the specific times, locations and 
magnitudes of the aftershocks that have already oc- 
curred. In addition, this model (1) assumes arbitrar- 
ily that the rate of aftershocks increases with the mag- 
nitude M of the mainshock as ~ 10 fcM , which may not 
be correct. A careful measure of this scaling for the 
southern California seismicity gives a different scaling 
- 10 qM with a = 0.8 [Helmstetter, 2003]. Moreover, 
an analytical study of the ETAS model [Sornette and 
Helmstetter, 2002] shows that the case a > b leads 



to an explosive seismicity rate, which is unrealistic to 
describe the seismic activity. 

2.3. Branching models 

Simulations using branching models as a tool for 
predicting earthquake occurrence over large time hori- 
zons were proposed in Kagan [1973], and first imple- 
mented in Kagan and Knopoff [1977]. In a recent 
work, Kagan and Jackson [2000] use a variation of 
the ETAS model to estimate the rate of seismicity in 
the future but they neglect the seismicity that will be 
triggered between the present time and the horizon 
and which may dominate the future activity. There- 
fore, these predictions are only valid at very short 
times, when very few earthquakes have occurred be- 
tween the present and the horizon. 

To solve this problem and to extend the predictions 
further in time, Kagan and Jackson [2000] propose to 
use Monte-Carlo simulations to generate many pos- 
sible scenarios of the future seismic activity. How- 
ever, they do not use this method in their forecast- 
ing procedure. These Monte-Carlo simulations will 
be implemented in our tests, as we describe below. 
This method has already been tested by Vere-Jones 
[1998] to predict a synthetic catalog generated using 
the ETAS model. Using a measure of the quality of 
seismicity forecasts in terms of mean information gain 
per unit time, he obtains scores usually worse than 
the Poisson method. We use below the same class of 
model and implement a procedure taking into account 
the cascade of triggering. We find, in contrast with 
the claim of Vere-Jones [1998], a very strong prob- 
ability gain. We explain in section 5.6 the origin of 
this discrepancy, which can be attributed to the use 
of different measures for the quality of predictions. 

In [Helmstetter et al., 2003], the forecasting skills 
of algorithms based on three functions of the current 
and past seismicity (above a magnitude threshold) 
measured in a sliding window of 100 events have been 
compared. The functions are (i) the maximum mag- 
nitude M max of the 100 events in that window, (ii) 
the Gutenberg-Richter 6-value measured on these 100 
events by the standard Hill maximum likelihood es- 
timator and (iii) the seismicity rate A defined as the 
inverse of the duration of the window. For each func- 
tion, an alarm was declared for the target of an earth- 
quake of magnitude larger than 6 when the function 
is either larger (for M max and A) or smaller (for b) 
than a threshold. These functions M max , b and A are 
similar and in some cases identical to precursors and 
predictors that have been studied by other authors. 
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Helmstetter et al. [2003] found that these three pre- 
dictors are considerably better than those obtained 
for a random prediction, with the prediction based 
on the seismicity rate A being by far the best. This is 
a logical consequence of the model of interacting trig- 
gered seismicity used in Helmstetter et al. [2003] and 
also in the present work, in which any relevant physi- 
cal observable is a function of the seismicity rate. At 
least in the class of interacting triggered seismicity, 
the largest possible amount of information is recov- 
ered by targeting the seismicity rate. All other targets 
are derived from it as linear or non-linear transforma- 
tions of it. Our present study refines and extends the 
preliminary tests of Helmstetter et al. [2003] by us- 
ing a full model of seismicity rather than the coarse- 
grained measure A. We note also that the forecasting 
methods of Rundle et al. [2001; 2002] are based on 
a calculation of the coarse-grained seismicity above a 
small magnitude threshold, which is then projected 
into the future. 

3. The model of triggered seismicity 

The parametric form that defines the ETAS model 
used in this paper was formulated by Ogata [1985, 
1987, 1988]. See [Ogata, 1999] and [Helmstetter and 
Sornette, 2002a] for reviews of its origins, a descrip- 
tion of the different versions of the model and of its 
applications to model or predict seismic activity. It 
is important to stress that the ETAS model is not 
only a model of aftershock sequences as the acronym 
ETAS (Epidemic-Type Aftershock Sequence) would 
make one to believe but is fundamentally a model of 
triggered interacting seismicity. 

In addition to the strict definition of the ETAS 
model used by Ogata [1985, 1987, 1988, 1989, 1999], 
there were and still are a variety of alternative para- 
metric forms of the extended "mutually exciting point 
processes" with marks (that is, magnitudes) intro- 
duced by Hawkes [1971, 1972], which have been ap- 
plied to earthquakes, including Kagan and Knopoff 
[1987]; Kagan [1991] and Lomnitz [1974]. Kagan and 
Knopoff [1987] differs from Ogata [1985, 1987, 1988] 
in replacing the role played by the parameter c in the 
modified Omori law (1) by an abrupt cut-off which 
models the duration of the mainshock. They think 
that a non-zero value of c is merely the artifact of 
the missing events immediately after the mainshock 
In contrast, based on the observation of the records 
of seismic waves, Utsu [1970, 1992] considers that the 
parameter c is not merely due to such artifact but 



also possesses some physical meaning. The analysis 
of Helmstetter and Sornette [2002a] shows that the 
choice of a non-zero c value [Ogata, 1988] or an abrupt 
cut-off [Kagan and Knopoff, 1987] does not lead to 
any detectable differences in simulated catalogs at 
time scales beyond c (which is usually very small). 
Thus, from the point of view of the collective behav- 
ior of the model, both formulations lead to essentially 
indistinguishable catalogs and statistical properties. 
Lomnitz [1974] 's model ("Klondike model") was also 
directly inspired by Hawkes [1971] and is similar to 
the ETAS model, but assumes different parametric 
forms: in particular, the number of triggered events 
is taken proportional to the magnitude, not the expo- 
nential of the magnitude. Kagan and Jackson [2000] 
also use a formulation of the same class but with a 
more complex specification of the time, space and 
magnitude dependence of the triggering process and 
propagator. 

3.1. Definition of the ETAS model 

The ETAS model of triggered seismicity is defined 
as follows [Ogata, 1985; 1987; 1988; 1989; 1992; 1999]. 
We assume that a given event (the "mother" ) of mag- 
nitude mi > to occurring at time ti and position fj 
gives birth to other events ( "daughters" ) in the time 
interval between t and t + dt and at point r±dr at 
the rate 

(p mi (t-U,r-fi) = p( mi ) $(t-ti) <f>(r-n) . (2) 
&(t) is the direct Omori law normalized to 1 

where 6 > 0, H(t) is the Heaviside function, and c is 
a regularizing time scale that ensures that the seis- 
micity rate remains finite close to the mainshock. 

$(r — n) is a normalized spatial "jump" distribu- 
tion from the mother to each of her daughter, which 
quantifies the probability for a daughter to be trig- 
gered at a distance \r — fi\ from the mother, taking 
into account the spatial dependence of the stress in- 
duced by an earthquake. 

p(m) gives the total number of aftershocks trig- 
gered directly by an event of magnitude to 

p(m) = k I0 a ( m - m °) , (4) 

where to is a lower bound magnitude below which no 
daughter is triggered. The adjective "direct" refers 
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to the events of the first generation triggered in first- 
order lineage from the mother event. The formulation 
of (3) and (4) is originally due to Utsu [1970]. 

The model is complemented by the Gutenberg- 
Richter (GR) law which states that each earthquake 
has a magnitude chosen according to the density dis- 
tribution 

P(m) = b ln(10) i(r b ( m - m °) . (5) 

P{m) is normalized: P(m) dm = 1. When mag- 
nitudes are translated into energies, the GR law be- 
comes the (power law) Pareto law. See [Ogata, 1989; 
1992; 1999; Quo and Ogata, 1997; Helmstetter and 
Sornette, 2002a] for a discussion of the values of the 
ETAS parameters c, a, 9, K and b in real seismicity. 
Note however that we have reasons to think that pre- 
vious inversions of the ETAS parameters in real data 
are unreliable due to lack of concern for the unob- 
served seismicity below the completeness threshold. 
We are presently working on methods to address this 
question. 

In this first investigation, we limit ourselves to the 
time domain, studying time series of past seismicity 
summed over an overall spatial region, without tak- 
ing into account information on earthquake locations. 
This amounts to integrating the local Omori law (2) 
over the whole space. In the following, we will thus 
use the integrated form of the local Omori law (2) 
given by 

<p mi (t-ti) = J (f> mi {t-ti,f-fi) dr = p(m,i) $(t-ti) . 

r 

(6) 

The complete model with both spatial and tempo- 
ral dependence (2) has been studied in [Helmstetter 
and Sornette, 2002b] to derive the joint probability 
distribution of the times and locations of aftershocks 
including the whole cascade of secondary and later- 
generations aftershocks. When integrating the rate of 
aftershocks calculated for the spatio-temporal ETAS 
model over the whole space, we recover the results 
used in this paper for the temporal ETAS model. 

We stress that not taking into account the spa- 
tial positions of the earthquakes is not saying that 
earthquakes occur at the same position. The syn- 
thetic catalogs we generate in space and time are 
similar to real catalogs and our procedure just ne- 
glects the information on space. Clearly, this is not 
what a full prediction method should do and it is 
clear that not using the information on the location of 
earthquakes will underestimate (sometimes grossly) 



the predictive skills that could be achieved with a 
full spatio-temporal treatment. However, the prob- 
lem is sufficiently complex that we find it useful to go 
through this first step and develop the relevant con- 
cepts and first tests using only information on seismic 
time sequences. 

3.2. Definition of the average branching ratio 

n 

The key parameter of model (2) is the average 
number (or "branching ratio" ) n of daughter-earthquakes 
created per mother-event. This average is performed 
over time and over all possible mother magnitudes. 
This average branching ratio n is a finite value for 
9 > and for a < b, equal to 

oo oo 

/f kb 
dt I P(m) p(m) $(t) dm = - — -. (7) 

m 

The normal regime corresponds to the subcritical case 
n < 1 for which the seismicity rate decays after a 
mainshock to a constant level (in the case of a steady- 
state source) with fluctuations in the seismic rate. 

Since n is defined as the average over all mainshock 
magnitudes of the mean number of events triggered 
by a mainshock, the branching ratio does not give the 
number of daughters of a given earthquake, because 
this number also depends on the specific value of its 
magnitude as shown by (4). As an example, take 
a = 0.8, b = 1, too = and n = 1. Then, a mainshock 
of magnitude M — 7 will have on average 80000 direct 
aftershocks, compared to only 2000 direct aftershocks 
for an earthquake of magnitude M — 5 and less than 
0.2 aftershocks for an earthquake of magnitude M = 
0. 

The branching ratio defined by (7) is the key pa- 
rameter of the ETAS model, which controls the dif- 
ferent regimes of seismic activity. There are two ob- 
servable interpretations for this parameter [Helmstet- 
ter and Sornette, 2003]. The branching ratio can be 
defined as the ratio of triggered events over total seis- 
micity when looking at catalog of seismicity at large 
scale. The branching ratio is also equal to the ratio 
of the number of secondary and later-generations af- 
tershocks over the total number of aftershocks within 
a single aftershock sequence. 
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4. Formal solution of the earthquake 
forecast problem in the ETAS model 

Having stressed the importance of the indirect trig- 
gered seismicity in determining both the overall level 
of seismicity and its decay law, we now formulate the 
task of earthquake forecast within this model of trig- 
gered seismicity restricted to the time domain. In this 
paper, we do not address the delicate issue related to 
the fact that not all earthquakes are observable or ob- 
served. Indeed, calibrations of the ETAS parameters 
using the magnitude cut-offs dictated by the require- 
ment of seismic catalog completeness rather than by 
the physics of triggered seismicity may lead to mis- 
leading results, as unobserved events may play a sig- 
nificant role (in their sheer number) in triggering ob- 
servable seismicity. To our knowledge, all previous 
calibrations of real seismic catalogs have bypassed this 
problem, which will be studied using a technique de- 
rived from our renormalized Omori law in a subse- 
quent paper. 

We first summarize the single source prediction 
problem, which has been studied previously by Helm- 
stetter and Sornette [2002a]. We then consider the 
complete prediction problem and derive analytically 
the solution for the future seismicity rate triggered by 
all past event and by a constant external loading. 

4.1. Formulation of the global seismicity rate 
and renormalized Omori's law 

We define the "bare propagator" <p(t) of the seis- 
micity as the integral of (2) over all magnitudes 

oc 

4>{t) = j P{m) <j> m {t) dm = n$(t) , (8) 

ma 

which is normalized to n since &(t) is normalized 
to 1. The meaning of the adjective "bare" will be- 
come clear below, when we demonstrate that cascades 
of triggered events renormalize <p{t) into an effec- 
tive ("renormalized" or "dressed") propagator K{t). 
This terminology is borrowed from statistical and 
condensed-matter physics which deal with physical 
phenomena occurring at multiple scales in which sim- 
ilar cascades of fluctuations lead to a renormalization 
of "bare" into "dressed" properties when going from 
small to large scales. See also [Sornette and Sornette, 
1999; Helmstetter and Sornette, 2002a] where this ter- 
minology was introduced in the present context. 

The total seismicity rate A(t) at time t is given by 
the sum of an "external" source s(t) and the after- 



shocks triggered by all previous events 

\{t)=s(t)+ Mt-u) , (9) 

i\U<t 

where <j) mi (t — U) is defined by (2). Here, "exter- 
nal" source refers to the concept that s{t) is the rate 
of earthquakes not triggered by other previous earth- 
quakes. This rate acts as a driving force ensuring that 
the seismicity does not vanish and models the effect 
of the external tectonic forcing. 

Taking the ensemble average of (9) over many pos- 
sible realizations of the seismicity (or equivalently 
taking the mathematical expectation), we obtain the 
following equation for the first moment or statisti- 
cal average N(t) of A(t) [Sornette and Sornette, 1999; 
Helmstetter and Sornette, 2002a] 



N(t) = s(t) 



Z 

J 



(f>(t - r) N(t) dr. (10) 



The average seismicity rate is the solution of this self- 
consistent integral equation, which embodies the fact 
that each event may start a sequence of events which 
can themselves trigger secondary events and so on. 
The cumulative effect of all the possible branching 
paths of activity gives rise to the net seismic activity 
N(t). Expression (10) states that the seismic activity 
at time t is due to a possible external source s(t) plus 
the sum over all past times r of the total previous 
activities N(t) that may trigger an event at time t 
according to the bare Omori law 4>(t — r). 

The global rate of aftershocks including secondary 
and later-generations aftershocks triggered by a main- 
shock of magnitude M occurring at t — is given 
by p(M)K(t)/n, where the renormalized Omori law 
K(t) is obtained as a solution of (10) with the general 
source term s(t) replaced by the Dirac function S(t): 

t 

K(t) = S(t)+J <f>(t - t) K{t) dr . (11) 
o 

The solution for K(t) can be obtained as the following 
series [Helmstetter and Sornette, 2002a] 



(12) 

The infinite sum expansion is valid for t > c, and t* 
is a characteristic time measuring the distance to the 
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critical point n = 1 denned by 



t* = c 



nT(l-0)\V« 



(13) 



t* is infinite for n = 1 and becomes very small for 
n«l. The leading behavior of K(t) at short times 
reads 



K(t) 



1 



1 t* 



1 -n r(6») i 1 - 



for c < t < i* , (14) 



showing that the effect of the cascade of secondary 
aftershocks renormalizes the bare Omori law $(i) ~ 
l/t 1+e given by (3) into K(t) ~ l/t 1 - , as illustrated 
by Figure 1. 

Once the seismic response K (t) to a single event 
is known, the complete average seismicity rate N(t) 
triggered by an arbitrary source s(t) can be obtained 
using the theorem of Green functions for linear equa- 
tions with source terms [Morse and Feshbach, 1953] 



m = / 



s(t) K(t - t) dr. 



(15) 



Expression (15) provides the general solution of (10). 

4.2. The multiple source prediction problem 

We assume that seismicity which occurred in the 
past until the "present" time u, and which does trig- 
ger future events, is observable. The seismic catalog 
consists of a list of entries {(ti, rrii), ti < u} giving the 
times ti of occurrence of the earthquakes and their 
magnitude rm. Our goal is to set up the best possible 
predictor for the seismicity rate for the future from 
time u to time t > u, based on the knowledge of this 
catalog {(ti,mi), ti < u}. The time difference t — u is 
called the horizon. In the ETAS model studied here, 
magnitudes are determined independently of the seis- 
mic rate, according to the GR distribution. There- 
fore, the sole meaningful target for prediction is the 
seismic rate. Once its forecast is issued, the predic- 
tion of strong earthquakes is obtained by combining 
the GR law with the forecasted seismic rate. 

The average seismicity rate N(t) at time t > u in 
the future is made of two contributions: 

• the external source of seismicity of intensity [i at 
time t plus the external source of seismicity that 
occurred between u and t and their following 
aftershocks that may trigger an event at time t; 



• the earthquakes that have occurred in the past 
at times U < u and all the events they triggered 
between u and t and their following aftershocks 
that may trigger an event at time t. 

We now examine each contribution in turn. 

4.2.1. Seismicity at times t > u triggered 
by a constant source /j, active from u to t. Us- 
ing the external seismicity source /i to forecast the 
seismicity in the future would underestimate the seis- 
micity rate because it does not take into account the 
aftershocks of the external loading. On the contrary, 
using the "renormalizcd" seismicity rate /i/(l — n) de- 
rived in [Helmstetter and Sornette, 2003] would over- 
estimate the seismicity rate because the earthquakes 
that were triggered before time u by the external 
source would be counted twice, since they are reg- 
istered in the catalog up to time u. The correct pro- 
cedure is therefore to evaluate the rate of seismic- 
ity triggered by a constant source /i starting at time 
u to remove the influence of earthquakes that have 
been recorded at times less than u, whose influence 
for times larger than u is examined later. 

The response K^t) of the seismicity to a constant 
source term [i starting at time u is obtained using 
(15) as 

t 

K lt , u (t)= t i J [K(t-T)-6(t-r)} dT = nK{t-u), 

u+ 

(16) 

where IC(t) is the integral of K(t) — 8(t) given by (12) 
from the lower bound u excluded (noted u + ) to t. 
This yields 



£<-'>' 

fe=0 



(*/**) 



*\k9 



r((fc + 1)0 + 1) 



(17) 

For times larger that t 3> t* , K^{t) reaches the 
asymptotic value = j^—. Expression (16) takes 
care of both the external source seismicity of intensity 
[i at time t and of its aftershocks and their following 
aftershocks from time u to t that may trigger events 
at time t. 

4.2.2. Hypermetropic renormalized propa- 
gator. We now turn to the effect counted from time 
u of the past known events prior to time u on future 
t > u seismicity, taking into account the direct, sec- 
ondary and all later-generation aftershocks of each 
earthquakes that have occurred in the past at times 
ti < u. Since the ETAS model is linear in the rate 
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variable, we consider first the problem of a single past 
earthquake at time ti < u and will then sum over all 
past earthquakes. 

A first approach for estimating the seismicity at t > 
u due to event i that occurred at time ti < u is to use 
the bare propagators <fr{t — ti), as done e.g. by Kagan 
and Jackson [2000]. This extrapolation leads to an 
underestimation of the seismicity rate in the future 
because it does not take into account the secondary 
aftershocks. This is quite a bad approximation when 
n is not very small, and especially for n > 0.5, since 
the secondary aftershocks are then more numerous 
than direct aftershocks. 

An alternative would be to express the seismicity 
at t > u due to an earthquake that occurred at U < u 
by the global propagator K(t — t»). However, this 
approach would overestimate the seismicity rate at 
time t because of double counting. Indeed, K(t — ti) 
takes into account the effect of all events triggered by 
event i, including those denoted j's that occurred at 
times ti < tj < u and which are directly observable 
and counted in the catalog. Thus, using K(t — ti) 
takes into account these events j, that are themselves 
part of the sum of contributions over all events in the 
catalog. 

The correct procedure is to calculate the seismicity 
at t > u due to event i by including all the seismic- 
ity that it triggered only after time u. This defines 
what we term the "hypermetropic renormalized prop- 
agator" K*(t — ti). It is "renormalized" because it 
takes into account secondary and all subsequent after- 
shocks. It is "hypermetropic" because this counting 
of triggered seismicity starts only after time u such 
that this propagator is oblivious to all the seismicity 
triggered by event i at short times from ti to u. 

We now apply these concepts to estimate the seis- 
micity triggered directly or indirectly by a mainshock 
with magnitude M that occurred in the past at time 
ti while removing the influence of the triggered events 
j occurring between U and u. This gives the rate 



S M ( t ) = ^l K*{t-U) 



(18) 



where the hypermetropic renormalized propagator 
K* is given by 



Kit) 



V 

I 



(t) K(t - t) dr 



(19) 



K*(t) defined by (19) recovers the bare propagator 
3>(t) for t « u, i.e., when the rate of direct aftershocks 



dominates the rate of secondary aftershocks triggered 
at time t > u. Indeed, taking the limit of (19) for 
u — > t gives 



K^t(t) 



I 



0(r) K(t - t) dr 



(r) 5(t -t) dr = <f>(t) . (20) 



This result simply means that the forecasting of fu- 
ture seismicity in the near future is mostly dominated 
by the sum of the direct Omori laws of all past events. 
This limit recovers procedures used by Kagan and 
Jackson [2000]. 

In the other limit, u w ti, i.e., for an event that 
occurred at a time ti just before the present u, K*(t) 
recovers the dressed propagator K(t) (up to a Dirac 
function) since there are no other registered events 
between ti and t and all the seismicity triggered by 
event i must be counted. Using equation (11), this 
gives 

K^ (t)= J <P(t) K(t—r) dr = K(t)—6(t) . (21) 
Using again (11), we can rewrite (19) as 

u 

K* u {t) = K{t)- j ' K(t-T)4>{r)dr . (22) 



Putting (12) in (19) we obtain for t > c 

t — u 

Kit) 



T(9) r(i - 6>) 



/ 1 

J {t-xi 



+ c) i+e x i-e 



EM* 



(x/t*) 



k() 



m+i)6) 



dx 



(23) 



fe=0 

where x = t — t. 

We present below useful asymptotics and approxi- 
mations of K* (t). 

• Hypermetropic renormalized propagator for t <C 

t* 

Putting the asymptotic expansion of K{t) for 
t < t* (14) in (19) we obtain for t > c and 

t > u 



K<t<At) 



i 



(t + c-u) b 



T(0) r(l-0) (u + c) (t + c) ' 

(24) 
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which recovers K(t) for u = 0. 

• Hypermetropic renormalized propagator for t S> 
u 

In the regime t » u, we can rewrite (22) as 

^(i) « K(t)-K(t) / 0(r)dr 

Jo 

- *(*)(! -n(l-^))25) 

• Hypermetropic renormalized propagator for t « 
u 

In the regime t ~ u and f - ii » c, we can 
rewrite (19) as 

t 

K(t) « 0(t) y K(t-r)dr 

U 

« 0(t) (/C(i - u) + 1) , (26) 

where /C(t) is the integral of if (t) — #(t) given 
by (17). The additional 1 in (26) comes from 
the Dirac 8(t) in the expression (11) of K(t). 

We have performed numerical simulations of the 
ETAS model to test our predictions on the hyperme- 
tropic renormalized propagator K*(t) (19,23). 

For the unrealistic case where a = 0, i.e., all events 
trigger the same number of aftershocks whatever their 
magnitude, the simulations show a very good agree- 
ment (not shown) between the results obtained by 
averaging over 1000 synthetic catalogs and the theo- 
retical prediction (19,23). 

Figure 2 compares the numerical simulations with 
our theoretical prediction for more realistic parame- 
ters a = 0.5 with n = 1, c = 0.001 day, 6 = 0.2 
and fi = 0. In the simulations, we construct 1,000 
synthetic catalogs, each generated by a large event 
that happened at time t = 0. The numerical hyper- 
metropic renormalized propagator or seismic activity 
K* (t) is obtained by removing for each catalog the in- 
fluence of aftershocks that were triggered in the past 
< ti < u where the present is taken equal to u = 10 
days and then by averaging over the 1,000 catalogs. 
It can then be compared with the theoretical predic- 
tion (19,23). Figure 2 exhibits a very good agree- 
ment between the realized hypermetropic seismicity 
rate (open circles) and K*(t) predicted by (19) and 
shown as the continuous line up to times t— u ~ 10 3 u. 
The hypermetropic renormalized propagator K*(t) is 



significantly larger than the bare Omori law $(£) but 
smaller than the renormalized propagator K{t) as ex- 
pected. Note that K*(t) first increases with the hori- 
zon t — u up to horizons of the order of u and then 
crosses over to a decay law K*(t) ~ l/t 1 ^ 6 parallel to 
the dressed propagator K(i). At large times however, 
one can observe a new effect in the clear deviation be- 
tween our numerical average of the realized seismicity 
and the hypermetropic seismicity rate K* (t) . This de- 
viation pertains to a large deviation regime and is due 
to a combination of a survival bias and large fluctu- 
ations in the numerics. For the larger value a = 0.8, 
which may be more relevant for seismicity, the devi- 
ation between the simulated seismicity rate and the 
prediction is even larger. Indeed, for a > 1/2, Helm- 
stetter et al. [2003] have shown that the distribu- 
tion of first-generation seismicity rate is a power law 
with exponent less than or equal to 2, implying that 
its variance is ill-defined (or mathematically infinite). 
Thus, averages converge slowly, all the more so at 
long times where average rates are small and fluctu- 
ations are huge. Another way to state the problem is 
that, in this regime a > 1/2, the average seismicity 
may be a poor estimation of the typical or most prob- 
able seismicity. Such effect can be taken into account 
approximately by taking into account the coupling 
between the fluctuations of the local rates and the re- 
alized magnitudes of earthquakes at a coarse-grained 
level of description [Helmstetter et al., 2003]. A de- 
tailed analytical quantification of this effect for K(t) 
(already discussed semi-quantitatively in [Helmstet- 
ter et al., 2003]) and for K*(t), with an emphasis on 
the difference between the average, the most probable 
and different quantiles of the distribution of seismic 
rates, will be reported elsewhere. 

Since we are aiming at predicting single catalogs, 
we shall resort below to robust numerical estimations 
of K(t) and K*(t) obtained by generating numerically 
many seismic catalogs based on the known seismicity 
up to time u. Each such catalog synthesized for times 
t > u constitutes a possible scenario for the future 
seismicity. Taking the average and calculating the 
median as well as different quantiles over many such 
scenarios provides the relevant predictions of future 
seismicity for a single typical catalog as well as its 
confidence intervals. 

5. Forecast tests with the ETAS model 

Because our goal is to estimate the intrinsic limits 
of predictability in the ETAS model, independently of 
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the additional errors coming from the uncertainty in 
the estimation of the ETAS parameters in real data, 
we consider only synthetic catalogs generated with 
the ETAS model. Testing the model directly on real 
seismicity would amount to test simultaneously sev- 
eral hypotheses/questions: (i) ETAS is a good model 
of real seismicity, (ii) the method of inversion of the 
parameters is correctly implemented and stable, what 
is the absolute limit of predictability of (iii) the ETAS 
model and (iv) of real seismicity? Since each of these 
four points are difficult to address separately and is 
not solved at all, we use only synthetic data generated 
with the ETAS model to test the intrinsic predictive 
skills of the model (iii), independently of the other 
questions. Our approach thus parallels several previ- 
ous attempts to understand the degree and the limits 
of predictability in models of complex systems, de- 
veloped in particular as models of earthquakes (see of 
instance [Pepke and Carlson, 1994; Pepke et at, 1994; 
Gabnelov et al, 2000]). 

Knowing the times tj and magnitude uii of all 
events that occurred in the past up to the present u, 
the mean seismicity rate N u (t) forecasted for the fu- 
ture t > u by taking into account all triggered events 
and the external source fi is given formally by 

N u {t)=»1C{t-u)+ £ ^^K(t-U), (27) 

n 

Z\ti<U 

where K*(t — ti) is given by (19) and fC(t) is given 
by (17). In the language of the statistics of point 
processes, expression (27) amounts to using the con- 
ditional intensity function. The conditional intensity 
function gives an unequivocal answer to the question 
of what is the best predictor of the process. All future 
behaviors of the process, starting from the present 
time u and conditioned by the history up to time u, 
can be simulated exactly once the form of the condi- 
tional intensity is known. To see this, we note that the 
conditional intensity function, if projected forward on 
the assumption that no additional events are observed 
(and assuming no external variables intervene), gives 
the hazard function of the time to the next occurring 
event past u. So if the simulation proceeds by us- 
ing this form of hazard function, then by recording 
the time of the next event when it does occur, and 
so on, ensures that one is always working with the 
exact formula for the conditional distributions of the 
inter-event times. The simulations then truly repre- 
sent the future of the process, and any functional can 
be taken from them in whatever form is suitable for 
the purpose at hand. 



In practice, we thus use the catalog of known earth- 
quakes up to time u and generate many different pos- 
sible scenarios for the seismicity trajectory which each 
take into account all the relevant past triggered seis- 
micity up to the present u. For this, we use the 
thinning simulation method, as explained by Ogata 
[1999]. We then define the average, the median and 
other quantiles over these scenarios to obtain the fore- 
casted seismicity N u (t). 

5.1. Fixed present and variable forecast 
horizon 

Figure 3 illustrates the problem of forecasting the 
aftershock seismicity following a large M = 7 event. 
Imagine that we have just witnessed the M = 7 event 
and want to forecast the seismic activity afterward 
over a varying horizon from days to years in the fu- 
ture. In this simulation, u is kept fixed at the time 
just after the M — 7 event and t is varied. A real- 
ization of the instantaneous rate of seismic activity 
(number of events per day) of a synthetic catalog is 
shown by the black dots. This simulation has been 
performed with the parameters n = 0.8, a — 0.8, 
b = 1, c = 0.001 day, m = 3 and \i = 1 event per day. 
This single realization is compared with two forecast- 
ing algorithms: the sum of the bare propagators of all 
past events ti < u (crosses), and the median of the 
seismicity rate obtained over 500 scenarios generated 
with the ETAS model, using the same parameters as 
used for generating the synthetic catalog we want to 
forecast, and taking into account the specific realiza- 
tion of events in each scenario up to the present. Fig- 
ure 4 is the same as Figure 3 but shows the seismic 
activity as a function of the logarithm of the time af- 
ter the mainshock. These two figures illustrate clearly 
the importance of taking into account all the cascades 
of still unobserved triggered events in order to fore- 
cast correctly the future rate of seismicity beyond a 
few minutes. The aftershock activity forecast gives a 
very reasonable estimation of the future activity rate, 
while the extrapolation of the bare Omori law of the 
strong M = 7 event together with the past seismic- 
ity very badly under- estimates the future seismicity 
beyond half-an-hour after the strong event. 

5.2. Varying "present" with fixed forecast 
horizon 

Figure 5 compares a single realization of the seis- 
micity rate (open circles) observed and summed over 
a 5 days period and divided by 5 so that it is expressed 
as a daily rate, with the predicted seismicity rate us- 
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ing either the sum of the bare propagators of the past 
seismicity (dots) or using the median of 100 scenar- 
ios (crosses) generated with the same parameters as 
for the synthetic catalog we want to forecast: n = 0.8, 
c = 0.001 day, [i — 1 event per day, m = 3,6 = 1 and 
a = 0.8. The forecasting methods calculate the total 
number of events over each 5 day period lying ahead 
of the present, taking into account all past seismic- 
ity including the still unobserved triggered seismicity. 
This total number of forecasted events is again di- 
vided by 5 to express the prediction as daily rates. 
The thin solid lines indicate the first and 9 th deciles 
of the distributions of the number of events observed 
in the pool of 100 scenarios. Stars indicate the oc- 
currence of large M > 7 earthquakes. Only a small 
part of the whole time period used for the forecast is 
shown, including the largest M = 8.5 earthquake of 
the catalog, in order to illustrate the difference be- 
tween the realized seismicity rate and the different 
methods of forecasting. 

The realized seismicity rate is always larger than 
the seismicity rate predicted using the sum of the bare 
propagators of the past activity. This is because the 
seismicity that will occur up to 5 days in the future 
is dominated by the seismicity that will be triggered 
in the near future that is still unobserved but must 
be taken into account. The realized seismicity rate is 
close to the median of the scenarios (crosses), and the 
fluctuations of the realized seismicity rate are in good 
agreement with the expected fluctuations measured 
by the deciles of the distributions of the seismicity 
rate over all scenarios generated. 

Figure 6 compares the predictions of the seismicity 
rate over a 5 day horizon with the seismicity of a typ- 
ical synthetic catalog; a small fraction of the history 
was shown in Figure 5. This comparison is performed 
by plotting the predicted number of events in each 5 
day horizon window as a function of the actual num- 
ber of events. The open circles (crosses) correspond 
to the forecasts using the median of 100 scenarios 
(the sum of the bare Omori propagators of the past 
seismicity). This Figure uses a synthetic catalog of 
N = 200000 events of magnitude larger than m n = 3 
covering a time period of 150 yrs. The dashed line cor- 
responds to the perfect prediction when the predicted 
seismicity rate is equal to the realized seismicity rate. 
This Figure shows that the best predictions are ob- 
tained using the median of the scenarios rather than 
using the bare propagator, which always underesti- 
mates the realized seismicity rate, as we have already 
shown. 



The most striking feature of Figure 6 is the exis- 
tence of several clusters, reflecting two mechanisms 
underlying the realized seismicity. 

1. cluster LL with large predicted seismicity and 
large realized seismicity; 

2. cluster SL with small predicted seismicity and 
large realized seismicity; 

3. cluster SS with small predicted seismicity and 
small realized seismicity; 

Cluster LL along the diagonal reflects the predictive 
skill of the triggered seismicity algorithm: this is when 
future seismicity is triggered by past seismicity. Clus- 
ter SL lies horizontally at low predicted seismicity 
rates and shows that large seismicity rates can also 
be triggered by an unforecasted strong earthquake, 
which may occur even when the seismicity rate is low. 
This expresses a fundamental limit of predictability 
since the ETAS model permits large events even for 
low prior seismicity, as the earthquake magnitudes are 
drawn from the GR independently of the seismic rate. 
About 20% of the large values of the realized seismic- 
ity rate above 10 events per day fall in the LL cluster, 
corresponding to a predictability of about 20% of the 
large peaks of realized seismic activity. The cluster 
SS is consistent with a predictive skill but small seis- 
micity is not usually an interesting target. Note there 
is no cluster of large predicted seismicity associated 
with small realized seismicity. 

Figure 7 is the same as Figure 5 for a longer time 
window of 50 days, which stresses the importance of 
taking into account the yet unobserved future seismic- 
ity in order to accurately forecast the level of future 
seismicity. Figure 8 is the same as Figure 6 for the 
forecast time window of 50 days with forecasts up- 
dated each 50 days. Increasing the time window T 
of the forecasts from 5 to 50 days leads to a smaller 
variability of the predicted seismicity rate. However, 
fluctuations of the seismicity rate of one order of mag- 
nitude can still be predicted with this model. The 
ETAS model therefore performs much better than a 
Poisson process for large horizons of 50 days. 

5.3. Error diagrams and prediction gains 

In order to quantify the predictive skills of differ- 
ent prediction algorithms for the seismicity of the next 
five days, we use the error diagram [Molchan, 1991; 
1997; Molchan and Kagan, 1992]. The predictions are 
made from the present to 5 days in the future and are 
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updated each 0.5 day. Using a shorter time between 
each prediction, or updating the prediction after each 
major earthquake, will obviously improve the predic- 
tions, because large aftershocks occur often just after 
the mainshock. But in practice the forecasting pro- 
cedure is limited by the time needed to estimate the 
location and magnitude of an earthquake. Moreover, 
predictions made for very short times in advance (a 
few minutes) are not very useful. 

An error diagram requires the definition of a target, 
for instance M > 6 earthquakes, and plots the frac- 
tion of targets that were not predicted as a function 
of the fraction of time occupied by the alarms (total 
duration of the alarms normalized by the duration of 
the catalog). We define an alarm when the predicted 
seismic rate is above a threshold. Recall that in our 
model the seismic rate is the physical quantity that 
embodies completely all the available information on 
past events. All targets one might be interested in 
derive from the seismic rate. 

Figure 9 presents the error diagram for M > 6 
targets, using a time window T = 5 days to esti- 
mate the seismicity rate, and a time dT = 0.5 days 
between two updates of the predictions. We use dif- 
ferent prediction algorithms, either the bare propaga- 
tor (dots), the median (circles) or the mean (trian- 
gles) number of events obtained for the 100 scenarios 
already generated to obtain Figures 5 and 6. Each 
point of each curve corresponds to a different thresh- 
old ranging from 0.1 to 1000 events per day. The 
results for these three prediction algorithms are con- 
siderably better than those obtained for a random 
prediction, shown as a dashed line for reference. 

Ideally, one would like the minimum numbers of 
failures and the smallest possible alarm duration. 
Hence, a perfect prediction corresponds to points 
close to the origin. In practice, the fraction of fail- 
ure to predict is 100% without alarms and the gain 
of the prediction algorithm is quantified by how fast 
the fraction of failure to predict decreases from 100% 
as the fraction of alarm duration increases. Formally, 
the gain G reported below is defined as the ratio of the 
fraction of predicted targets (=1— number of failures 
to predict) divided by the fraction of time occupied by 
alarms. A completely random prediction corresponds 
to G = 1. 

We observe that about 50% of the M > 6 earth- 
quakes can be predicted with a small fraction of alarm 
duration of about 20%, leading to a gain of 2.5 for this 
value of the alarm duration. The gain is significantly 
stronger for shorter fractions of alarm duration: as 



shown in panel (b) of Figure 9, 25% of the M > 6 
earthquakes can be predicted with a small fraction of 
alarm duration of about 2%, leading to a gain of 12.5. 
The origin of this good performance for only a frac- 
tion of the targets has been discussed in relation with 
Figure 6, and is associated with those events that oc- 
cur in times of large seismic rate (cluster along the 
diagonal in Figure 6). Panel (c) of Figure 9 shows the 
dependence of the prediction gain G as a function of 
the alarm duration: the three prediction schemes give 
approximately the same power law increase with an 
exponent close to 1/2 of the gain as the duration of 
alarm decreases. For small alarm duration, the gain 
reaches values of several hundreds. The saturation 
at very small values of the alarm duration is due to 
the effect that only a few targets are sampled. Fig- 
ures 10 and 11 are similar to Figure 9, respectively 
for a smaller target of magnitudes larger than 5 and 
a larger target of magnitudes larger than 7. 

Table 1 presents the results for the prediction gain 
and for the number of successes using different choices 
of the time window T and of the update time dT be- 
tween two predictions, and for different values of the 
target magnitude between 5 and 7. The prediction 
gain decreases if the time between two updates of the 
prediction increases, because most large earthquakes 
occur at very short times after a previous large earth- 
quake. In contrast, the prediction gains do not de- 
pend on the time window T for the same value of the 
update time dT. 

The prediction gain is observed to increase signif- 
icantly with the target magnitude, especially in the 
range of small fraction of alarm durations (see Ta- 
ble 1 and Figures 9-11). However, this increase of 
the prediction gain does not mean that large earth- 
quakes are more predictable than smaller ones, in con- 
trast with, for example, the critical earthquake theory 
[Sornette and Sammis, 1995; Jaume and Sykes, 1999; 
Sammis and Sornette, 2002] . In the ETAS model, the 
increase of the prediction gain with the target mag- 
nitude is due to the decrease of the number of target 
events with the target magnitude. Indeed, choosing 
N events at random in the catalog independently of 
their magnitude gives on average the same prediction 
gain as for the N largest events. This demonstrates 
that the larger predictability of large earthquakes is 
solely a size effect. 

We now clarify the statistical origin of this size ef- 
fect. Let us consider a catalog of total duration D 
with a total number N of events analyzed with D/T 
time windows with horizon T. These D/T windows 
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can be sorted by decreasing seismicity n > r 2 > ... > 
r n > where ri is the i-th largest number of events 
in a window of size T. There are ni, ri2, rii, ... 
windows of type 1, 2, i, ... respectively such that 
Y]. ri rii — N . Then, the frequency-probability that 
an earthquake drawn at random from the catalog falls 
within a window of type i is 



We have found that, over more than three decades 
spanning from 1 event to 10 3 — 10 4 events per window, 
the cumulative distribution of these p^s is a power law 
with exponent approximately equal to k = 0.4. This 
power law is found for the observable realized seis- 
micity as well as for the seismicity predicted by the 
different methods discussed above. Such a small ex- 
ponent k implies that the few windows that happen 
to have the largest number of events contain a signif- 
icant fraction of the total seismicity. It can be shown 
[Feller, 1971] that, in absence of any constraint, the 
single window with the largest number of events con- 
tains on average 1 — k = 60% of the total seismic- 
ity. This would imply that there is a 60% probability 
that an earthquake drawn at random within the cat- 
alog (of 150 years) belongs to this single window of 
5 days. This effect is moderated because the random 
variables pi have to sum up to 1 by definition. This 
can be shown to entail a roll-off of the cumulative 
distribution depleting the largest values of pi. Em- 
pirically, we find that the most active window out of 
the approximately 54, 000 daily windows of our 150 
years long catalog contains only 3% of the total num- 
ber of events. While this value of 3% is smaller than 
the prediction of 60% in absence of normalization, 
it is considerably larger than the "democratic" re- 
sult which would predict a fraction of about 0.002% 
of the seismicity in each window. Since a high seis- 
micity rate implies strong interactions and triggering 
between earthquakes and is usually associated with 
recent past high seismicity; the events in such a win- 
dow are highly predictable. When the number of tar- 
gets increases, one starts to sample statistically other 
windows with smaller seismicity which have a weaker 
relation with triggered seismicity and thus present less 
predictive power. 

In our previous discussion, we have not distin- 
guished the skills of the three algorithms, because 
they perform essentially identically with respect to 
the assigned targets. This is very surprising from 
the perspective offered by all our previous analysis 
showing that the naive use of the direct Omori law 



without taking into account the effect of any indi- 
rect triggered seismicity strongly underestimate the 
future seismicity. We should thus expect a priori that 
this prediction scheme should be significantly worse 
than the two others based on a correct counting of all 
unobserved triggered seismicity. The explanation for 
this paradox is given by examining Figure 12, which 
presents further insight in the prediction methods ap- 
plied to the synthetic catalogs used in Figures 3-11. 
Figure 12 shows three quantities as a function of the 
threshold in seismicity rate used to define an alarm, 
for each of the three algorithms. These quantities are 
respectively the duration of alarms normalized by the 
total duration of the catalog shown in panel (a), the 
fraction of successes (=1— failure to predict) shown 
in panel (b) and the prediction gain shown in panel 
(c) . These three panels tell us that the incorrect level 
of seismic activity predicted by the bare Omori law 
approach can be compensated by the use of a lower 
alarm threshold. In other words, even if the seismic- 
ity rate predicted by the bare Omori law approach is 
wrong in absolute values, its time evolution in rela- 
tive terms contains basically the same information as 
the full-fledged method taking into account all unob- 
served triggered seismicity. Therefore, an algorithm 
that can detect a relative change of seismicity can per- 
form as well as the complete approach for the forecast 
of the assigned targets. This is an illustration of the 
fact that predictions of different targets can have very 
different skills which depend on the targets. Using 
the full-fledged renormalized approach is the correct 
and only method to get the best possible predictor 
of future seismicity rate. However, other simpler and 
more naive methods can perform almost as well for 
more restricted targets, such as the prediction of only 
strong earthquakes. 

5.4. Optimization of earthquake prediction 

The optimization of the prediction method requires 
the definition of a loss function 7, which should be 
minimized in order to determine the optimum alarm 
threshold of the prediction method. The probability 
gain defined in the previous section cannot be used 
as an optimization method, because it is maximum 
for zero alarm time. The strategies that optimize the 
probability gain are thus very impractical. An error 
function commonly used [e.g., Molchan and Kagan, 
1992] is the sum of the two types of errors in earth- 
quake prediction, the fraction of alarm duration r and 
the fraction of missed events v. This loss function is 
illustrated in Figure 13 for the same numerical simu- 
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lation of the ETAS model as in the previous section, 
using a prediction time window of 5 days and an up- 
date time of 0.5 day, for different values of the target 
magnitude between 5 and 7. For a prediction algo- 
rithm that has no predictive skill, the loss function 
7 should be close to 1, independently of the alarm 
duration r. This is indeed what we observe for the 
prediction algorithm using a Poisson process, with a 
constant seismicity rate equal to the average seismic- 
ity rate of the realized simulation. For the prediction 
methods based on the ETAS model, we obtain a sig- 
nificant predictability by comparison with the Poisson 
process (Figure 13). The loss function is minimum 
for a fraction of alarm duration of about 10%, and 
decreases with the target magnitude M t from 0.9 for 
M t = 5 down to 0.7 for M t = 7. We expect that the 
minimum loss function will be much lower when using 
the information on earthquake locations. 

5.5. Influence of the ETAS parameters on 
the predictability 

We now test the influence of the model parame- 
ters a, n, mo and p on the predictability. We did 
not test the influence of the b- value, because this pa- 
rameter is rather well constrained in seismicity, and 
because its influence is felt only relative to a. We 
keep c equal to 0.001 day because this parameter is 
not critical as long as it is small. The external loading 
fi is also fixed to 1 event/day because it is only a mul- 
tiplicative factor of the global seismicity. The value of 
the minimum magnitude mo above which earthquakes 
may trigger aftershocks of magnitude larger than mo 
is very poorly constrained. This value is no larger 
than 3 for the Southern California seismicity, because 
there are direct evidences of M3+ earthquakes trig- 
gered by M — 3 earthquakes [Helmstetter, 2003]. We 
are limited in our exploration of small values of m 
because the number of earthquakes increases rapidly 
if mo decreases, and thus the computation time be- 
comes prohibitive. We have tested the values m = 1 
and mo = 2 and find that the effect of a decrease of 
m is simply to multiply the seismicity rate obtained 
for M > 3 by a constant factor. Using a smaller m 
does not change the temporal distribution of seismic- 
ity and therefore does not change the predictability 
of the system. 

We use the minimum value of the error function 
7 introduced in the previous section to characterize 
the predictability of the system. This function is es- 
timated from the seismicity rate predicted using the 
bare propagator, as done in previous studies [Kagan 



and Knopoff, 1987; Kagan and Jackson, 2000]. While 
all three prediction methods that we have investigated 
give approximately the same results for the error di- 
agram, the method of the bare propagator is much 
faster, which justifies to use it. The results are sum- 
marized in Table 2, which gives the minimum value 
of the error function 7 for each set of parameters, and 
the corresponding values of the alarm duration, the 
proportion of predicted M > 6 events and the pre- 
diction gain. All values of 7 arc in the range 0.6-0.9, 
corresponding to a small but significant predictabil- 
ity of the ETAS model. The predictability increases 
(i.e, 7 decreases) with a, n and p, because for large a, 
large n and/or large p, there are larger fluctuations 
of the seismicity rate which have stronger impacts for 
the future seismicity. The minimum magnitude mo 
has no significant influence on the predictability. We 
recover the same pattern when using another estimate 
of the predictability measured by the prediction gain 
Gi% for an alarm duration of 1%. 

5.6. Information gain 

We now follow Kagan and Knopoff [1977], who in- 
troduced the entropy/information concept linking the 
likelihood gain to the entropy per event and hence to 
the predictive power of the fitted model, and Vere- 
Jones [1998], who suggested using the information 
gain to compare different models and to estimate the 
predictability of a process. 

Our forecasting algorithm provides the average 
seismicity rate Xi above mo in the time interval 
(ti, ti+T). Assuming a constant magnitude distribu- 
tion given by (5), the probability pi to have at least 
one event above the target magnitude M t in the time 
interval (t i} ti + T) can be evaluated from the average 
seismicity rate Ai by 



Pi = 1 - exp(-Aj l(T b ( M <- m °)) 



(29) 



Figure 14 shows the probability pi obtained for differ- 
ent choices of the target magnitude M t for the same 
sequence as in Figure 5. 

The binomial score B compares the prediction pi 
with the realization Xi, with Xi = 1 if a target event 
occurred in the interval (ti ,ti + T) and Xi = other- 
wise. For the whole sequence of intervals (ti, ti + T), 
the binomial score is defined by 

B = Y^[Xilog(p i ) + (l-X i )log(l-p i )\ , (30) 

i 

where the sum is performed over all (non-overlapping) 
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time windows covering the whole duration of the cat- 
alog. The first term represents the contribution to the 
score from those intervals which contain an event, and 
the second term the contribution to the score from 
those intervals which contain no event. 

In order to test the performance of a forecasting al- 
gorithm, we compare the binomial score B of the fore- 
cast with the binomial score of a Poisson process. We 
use two definitions of the Poisson process. First, we 
use the same definition as in [Vere- Jones, 1998] and 
take a Poisson process with a seismicity rate equal 
to the average seismicity rate of the realized cata- 
log and use equation (29) to estimate the probability 
of having at least an event above the target magni- 
tude in each time interval. Because this definition of 
the probability assumes a uniform temporal distribu- 
tion of target events and thus neglects clustering of 
large events, it over-estimates the proportion of in- 
tervals which have at least one target event. Indeed, 
the probability of having several events of magnitude 
M > M t in the same time window is much higher for 
the ETAS model than for a Poisson process. We thus 
propose another definition of the Poisson process de- 
fined by putting all values of pt equal to the fraction 
of intervals in the realized catalog that have at least 
a target events. For small time intervals and/or large 
target magnitudes, the proportion of intervals that 
have several target events is very small, and the two 
definitions of the Poisson process give similar results. 
The results for different choices of the time interval T 
and of the target magnitude M t are listed in Table 3. 

We evaluate the binomial score B for different pre- 
diction algorithms: (i) the average of the seismicity 
rate over all scenarios (B mean ), (ii) the exponential 
of the average of the logarithm of the mean of the 
seismicity rate (B mean i), (iii) the median of the seis- 
micity rate (B mec i) and (iv) the sum of the bare prop- 
agators of the past seismicity (B^). The results for 
the forecasting methods based on the median seis- 
micity rate (B me d and B mean i) are in general better 
than the Poisson process, i.e., the binomial score for 
the forecasting algorithms are larger than the score 
obtained with a Poisson process for both definitions 
of the Poisson process. The binomial score B po i S 2 
measured using the second definition of the Poisson 
process is higher than the binomial score B pois i for 
the first definition of the Poisson process because the 
first method is biased and over-estimates the proba- 
bility of having at least a target event. The scores of 
the forecasting methods which take into account the 
cascade of secondary aftershocks (B me d, B mean and 



Bmeani in Table 3) are significantly better than the 
score Bff, obtained with the bare propagator, even for 
short time intervals T, because this predictor under- 
estimates the realized seismicity rate. Similarly, the 
score B mean obtained for the average seismicity rate 
is generally smaller than B me d because the averaging 
method generally over-estimates the realized seismic- 
ity rate. For large time intervals T > 10 days, and 
for large target magnitudes, the results for the bare 
propagator are even worse than the results obtained 
with a Poisson process using both definitions of the 
Poisson process. 

Our results arc in disagreement with those re- 
ported in Vere- Jones [1998] on the same ETAS model: 
we conclude that the ETAS model has a signifi- 
cantly higher predictive power than the Poisson pro- 
cess while Vere- Jones [1998] concludes that the fore- 
casting performance of the ETAS model is worse than 
with the Poisson model. Vere-Jones and Zhuang's 
procedure and ours are very similar. They use the 
same method to generate ETAS simulations and to 
update the predictions at rigid time intervals, with a 
similar time between two updates of the predictions. 
They use the same method to estimate the probabil- 
ity of having a target event for the Poisson process 
(corresponding to our first definition of the Poisson 
process) but a different method to estimate the prob- 
ability pi for the ETAS model. Rather than using eq. 
(29) to derive the probability pi from the seismicity 
rate as done in this work, they measure directly this 
probability from the fraction of scenarios which have 
at least a target event. We have compared the two 
methods and found that the method of Vere-Jones 
is very similar to the results derived from the me- 
dian seismicity rate. However, for large target events 
and small time intervals, the method of Vere-Jones 
requires to generate a huge number of scenarios to 
obtain accurate estimates of the fraction of scenarios 
which have at least a target event. We thus believe 
that our method might be better in this case and give 
a more accurate estimate of the probability pi of hav- 
ing at least a target event. This is one possible origin 
of the discrepancy between Vere-Jones' results and 
ours. 

The ETAS parameters used in the two studies are 
also very similar and cannot account for the disagree- 
ment. The tests of Vere-Jones [1998] have been per- 
formed using a a/b ratio of 0.57/1.14 = 0.5 smaller 
than the value a/b = 0.8 used in our simulations. 
This difference may lead to a smaller predictability 
for the simulations of Vere-Jones [1998] because there 
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are fewer large aftershock sequences. The branching 
ratio n = 0.78 used by Vere- Jones [1998] is very close 
to our value n — 0.8. However, these difference in 
the ETAS parameters cannot explain why Vere- Jones 
[1998] obtains a better predictability for a Poisson 
process than for the ETAS model. Vere- Jones [1998] 
concludes that the Poisson process is better predic- 
tive than ETAS because the binomial score measured 
for time periods containing target events, i.e., by tak- 
ing only the first term of eq. (30), is larger for the 
Poisson process than for ETAS. While we agree with 
this point, we note that the fact that the first term 
of the binomial score is larger for the Poisson pro- 
cess than for ETAS does not imply that the Pois- 
son process is more predictive. Indeed, the score for 
periods containing events is maximum for a simple 
model specifying a probability pi = 1 of having an 
event for all time intervals. Because this simple model 
does not miss any event, it gives the maximum bino- 
mial score for periods containing target events, but 
is of course not better than the ETAS or the Pois- 
son model if we take into account the second term 
of the binomial score corresponding to periods with- 
out target events. If we take our second definition of 
the Poisson process, taking into account clustering, 
we obtain a better score with the ETAS model for 
periods containing target events than for the Poisson 
process. We thus think that the conclusion of Vere- 
Jone [1998] that the ETAS model is sometimes less 
predictive than the Poisson process is due to an inad- 
equate measure of the predictability. We thus caution 
that a suitable assessment of the forecasting skills of 
a model requires several complementary quantitative 
measures, such as the predicted versus realized seis- 
micity rates, the error diagram and predictability gain 
and the entropy-information gains, and a large num- 
ber of target events. 

6. Conclusions 

Using a simple model of triggered seismicity, the 
ETAS model, based on the (bare) Omori law, the 
Gutcnbcrg-Richter law and the idea that large events 
trigger more numerous aftershocks, we have devel- 
oped an analytical approach to account for the trig- 
gered seismicity adapted to the problem of forecast- 
ing future seismic rates at varying horizons from the 
present. Tests presented on synthetic catalogs have 
validated the use of interacting triggered seismicity 
to forecast large earthquakes in these models. This 
work provides what we believe is a useful benchmark 
from which to develop real prediction tests of real 



catalogs. These tests have also delineated the funda- 
mental limits underlying forecasting skills, stemming 
from an intrinsic stochastic component in the seis- 
micity models. Our results offer a rationale for the 
fact that pattern recognition algorithms may perform 
better for strong earthquakes than for weaker events. 
Although the predictability of an earthquake is in- 
dependent of its magnitude in the ETAS model, the 
prediction gain is better for the largest events because 
they are less numerous and it is thus more probable 
that they are associated with periods of large seismic- 
ity rates, which are themselves more predictable. 

We have shown in Helmstetter et al. [2003] that 
most precursory patterns used in prediction algo- 
rithms, such as a decrease of b- value or an increase 
of seismic activity can be reproduced by the ETAS 
model. If the physics of triggering is fully charac- 
terized by the class of models discussed here, this 
suggests that patterns and precursory indicators are 
sub-optimal compared with the prediction based on 
a full modeling of the seismicity. The calibration of 
the ETAS model or some of its variants on real cat- 
alogs as done in Kagan and Knopoff [1987], Kagan 
and Jackson [2000], Console and Murru [2001], Ogata 
[1988, 1989, 1992, 1999, 2001], Kagan [1991], Felzer et 
al. [2002] represent important steps in this direction. 
However, in practical terms, the issue of the model 
errors associated with the use of an incorrect model 
calibrated on an incomplete data set with not fully 
known parameters may make this statement weaker 
or even turn it on its head. 
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Figure 1. A realization of the ETAS model shows the seismicity rate (open circles) as a function of time after a 
large earthquake. This illustrates the differences between the realized seismicity rate X(t) (open circle), the average 
rcnormalized (or dressed) propagator K{t) (gray line), and the local propagator <f? m (t) (thin black line) . This 
aftershock sequence has been generated using the ETAS model with parameters n = 0.91, a — 0.5, b = 1, 9 = 0.2, 
mo = and c = 0.001 day, starting from a mainshock of magnitude M = 8 at time t = 0. The global aftershock rate 
is significantly higher than the direct (or first generation) aftershock rate, described by the local propagator $ m (i). 
The value of the branching ratio n = 0.915 implies that about 91.5% of aftershocks are triggered indirectly by the 
mainshock. The global aftershock rate N(t) decreases on average according to the dressed propagator K{t) ~ l/t 1 ^ 6 
for t < t* , which is significantly slower than the local propagator <f>(t) ~ l/t 1+e . These two asymptotic power laws 
are shown as dotted-dashed lines with their slopes in the log-log plot corresponding respectively to the exponents 
1 - 9 and 1 + 9. 
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Figure 2. Seismicity rate (hypermetropic renormalized propagator) following a large event that happened at time 
t — 0, removing the influence of aftershocks that were triggered in the past < U < u with u = 10 days. We 
have averaged over 1000 simulations starting at time u — 10 days after a large event of magnitude m — 6, using 
the parameters n = 1, c = 0.001 day, 9 = 0.2 and a = 0.5. There is a very good agreement between the realized 
hypermetropic seismicity rate (open circles) and K*(t) predicted by (19) and shown as the continuous line up to 
times t — u ~ 10 3 u. At large times, there is clear deviation between our numerical average of realized seismicity 
and the hypermetropic seismicity rate K* (t) due to a combination of a survival bias and large fluctuations in the 
numerics. 
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Figure 3. Rate of seismic activity for a synthetic catalog (black dots) generated with the parameters n = 0.8, 
a = 0.8, b = 1, c = 0.001 day, mo = 3 and /i = 1 event per day. We compare different methods of prediction of the 
seismicity rate following a large event M = 7 that has occurred at the time of the large peak shown in the figure. 
Using the data up to the present time fj < u, where u is the "present" taken fixed just after the M — 7 earthquake, 
we try to predict the future activity up to 1 year in the future. We use two predictions algorithms: the sum of the 
bare propagators of all past events U < u (crosses), and the median of the seismicity rate (open circles) obtained 
over 500 scenarios generated with the ETAS model, using the same parameters as for the synthetic catalog we want 
to predict, and taking into account the specific realization of the synthetic catalog up to the present. 
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Figure 4. Same as Figure 3 but as a function of the logarithm of the time after the mainshock. At early time 
t — u < 10~ 2 days, the predicted seismicity rate is correctly predicted by the naive bare Omori law shown by 
the crosses which is indistinguishable from the more elaborate scheme involving all cascades of triggered events. 
At larger times, the cascade of triggered seismicity renormalizes the seismicity rate to a significantly higher level, 
which is correctly forecasted by the mean over the 500 scenarios. 
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Figure 5. Comparison between the seismicity rate (black line) observed for 5 day time periods, with the predicted 
seismicity rate using either the sum of the bare propagators of the past seismicity (dots) or using the median of 100 
scenarios (gray line) generated with the same parameters as for the synthetic catalog we want to predict, n = 0.8, 
c = 0.001 day, fj, = 1 event per day, mo = 3, b = 1 and a = 0.8. The thin solid lines indicate the first and 9 th 
deciles of the set of 100 scenarios: there is 80% probability that the predicted seismicity over the next 5 days falls 
within these two lines. Stars indicate the occurrence of a large M > 7 earthquake. Forecasts are updated every 5 
days. 
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Figure 6. Comparison between the seismicity rate observed for 5 day time periods from the present with the 
predicted seismicity rate over the same 5 day periods using either the sum of the bare propagators of the past 
seismicity (crosses) or using the median of 100 scenarios (circles), corresponding to the same data shown in Figure 
5 but using a long synthetic catalog of N = 200000 events over a time period of 150 yrs. The dashed line corresponds 
to the perfect forecast when the predicted seismicity rate is equal to the realized seismicity rate. This figure shows 
that the best forecasts are obtained using the median of the scenarios rather than using the bare propagator, which 
always underestimates the realized seismicity rate. The meaning of the clusters LL, SL and SS are discussed in the 
text. Forecasts are updated every 5 days. A faster rate of updating does not change the fraction of predictable 
events lying close to the diagonal. 
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Figure 7. Same as Figure 5 but for a larger horizon t = 50 days. 
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Figure 8. Same as Figure 6 but for a larger horizon t = 50 days. For such large horizons, taking into account the 
cascade of triggered seismicity makes a large difference on the performance of the predicted seismicity rate. 
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Figure 9. Error diagram of different prediction algorithms, using either the bare propagator (dots), the median 
(circles) or the mean (triangles) number of events obtained for the scenarios. The synthetic catalog and the 
prediction methods are the same as for Figures 5 and 6. We use a time horizon (window size) of T — 5 days to 
estimate the seismicity rate but we update the predictions each 0.5 day. Target events are M > 6 earthquakes. An 
alarm is defined when the predicted seismicity rate is above a threshold. Each point of the curve corresponds to a 
different threshold ranging from 0.1 to 1000 events per day. The quality of the predictions is measured by plotting 
the ratio of failures to predict as a function of the total durations of the alarms normalized by the duration of the 
catalog. The results for these three algorithms are considerably better than those obtained for a random prediction, 
shown as a dashed line for reference. This Figure shows that about 20% of large peaks of seismic activity can be 
predicted with a very small alarm duration of about 1%. Panel (b) is a magnification of panel (a) close to the origin 
of the alarm duration showing the very fast increase of the success fraction (= f — failure to predict) as the alarm 
duration increases from 0. Panel (c) shows that the predicted gain, defined as the ratio of the success fraction over 
the alarm duration, decays as a function of the alarm duration from a very high value obtained for very short alarm 
durations as an approximate inverse power law with exponent slightly larger than 1/2. 
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Figure 10. Same as Figure 9 but for targets with lower magnitudes M > 5. Panel (c) shows that the predicted 
gain is again approximately an inverse power law with exponent close to 1/2 as a function of the alarm duration. 
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Figure 11. Same as Figure 9 but for targets with larger magnitudes M > 7. Panel (c) shows that the predicted 
gain is again approximately an inverse power law with exponent slightly smaller than 1 as a function of the alarm 
duration. Comparing this figure with Figures 9 and 10 suggests that the exponent defined in panel (c) is slowly 
increasing with the magnitude of the targets. 
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Figure 12. Analysis of the prediction methods, using the same synthetic catalog and predictions methods as for 
Figures 5-11. We use a time window T = 5 days to estimate the predicted seismicity rate, and a time dT = 0.5 
days between two updates of the prediction. Target events are M > 6 earthquakes. The duration of alarms 
normalized by the total duration of the catalog is shown in panel (a) as a function of the alarm threshold for the 
three predictions methods : bare propagator (dots), the median (circles) and the mean (triangles) number of events 
obtained for the scenarios. The proportion of successes is shown in panel (b). The prediction gain shown in panel 
(c) is defined by the ratio of the proportion of successes (b) over the duration of alarms (a). The prediction gain 
for large values of the alarm threshold is significantly higher that the random prediction gain equal to 1. 
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Figure 13. Loss function 7 defined as the sum of the fraction of missed events and of the fraction of alarms. 
We use the same catalog and prediction method as in Figures 5-12, with a time window of 5 days, and an update 
time of 0.5 days. Solid lines give the results for the prediction algorithm based on the ETAS model (average of the 
seismicity rate over 100 scenarios) for different values of the target magnitudes. Dashed lines correspond to the 
predictions made using a Poisson process, with a seismicity rate equal to the average seismicity rate of the catalog. 
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Figure 14. Probability pM(t) of having at least an event of magnitude larger than M in the time interval (t, t + T) 
with T=5 days. We have evaluated pu for M between 5 and 7, from the seismicity rate predicted by the median 
of 100 scenarios, and using equation (29) to estimate the probability pAi(t) from the average seismicity rate X(t) 
in the time interval (t, t + T). pAi(t) is plotted for the sequence shown in Figure 5. Stars indicate the occurrence 
of a large M > 7 earthquake. The largest M = 8.55 event of the sequence occurs at time t — 164 days when the 
seismicity rate is close to the average value. Thus, this event cannot be predicted. Six large M > 7 earthquakes 
occur in the following 500 days when the seismicity rate is still above its average value, including three M > 7 
events in the 5 days immediately following the great event. 



Table 1. Prediction gain for different choices of the alarm duration, and/or different values of the time 
interval T, of the update time dT, and of the target magnitude M t . N\ is the number of targets M > M t ; N2 
is the number of intervals with at least one target. G max is the maximum prediction gain, which is realized for 
an alarm duration A (in proportion of the total duration of the catalog), which is also given in the table. All 
three prediction algorithms used here provide the same gain as a function of the alarm duration, corresponding 
to different choices of the alarm threshold on the predicted seismicity rate. N s is the number of successful 
predictions, using the alarm threshold that provides the maximum predictions gain G max for an alarm duration 
A (we count only one success when two events occur in the same interval). This number N B is always very 
small, but a much larger number of successes can be obtained with a larger alarm duration. N\%, N w %, N 50 % 
are the number of successes corresponding to an alarm duration (in proportion of the total duration of the 
catalog) of 1%, 10% and 50% respectively, corresponding to the prediction gains Gi%, Gio% and G^% f . The 
values of G 50 % show a saturation in the predictive power when increasing the fraction of alarm time, reflecting 
the fundamental limitation stemming from the fraction of large earthquakes not associated with a large seismic 
rate. Reading for instance of the last line of this table, we observe that, out of 26 time windows of 50 days that 
contained a M > 7 earthquake, we are able to predict 7 of them with only 1% of the time occupied by alarms. 
Only two additional ones are predicted when using 10% of the time occupied by alarms. And only another 
four are predicted by increasing the time of alarms to half the total duration of the catalog. The catalog spans 
150 years corresponding to a little more than 10 5 half-day periods. 
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Table 2. Minimum value of the error function 7 for each 
set of parameters n, too, a and p. Another measure of the 
predictability is the prediction gain (Gi%) corresponding to 
an alarm duration of 1%, which is equal to the percentage of 
predicted events. Predictions are done for the next day and 
updated each day (T = dT = 1 days). 



n to a p 7 G 1% 



0.8 
0.8 
0.8 
0.5 
1.0 
0.8 
0.8 
0.8 
0.8 



3.0 
3.0 
3.0 
3.0 
3.0 
1.0 
2.0 
3.0 
3.0 



0.5 
0.8 
0.9 
0.8 
0.8 
0.8 
0.8 
0.8 
0.8 



1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.2 
1.1 
1.3 



0.86 
0.84 
0.81 
0.89 
0.64 
0.83 
0.82 
0.86 
0.76 



3.4 
8.9 

12.5 
4.8 

18.8 
6.5 
9.9 
9.3 

19.1 
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Table 3. Binomial scores B for several prediction algorithms and different choices of the time interval T and the 
target magnitude M t . We use non-overlapping time intervals for the predictions of length T with a time dT = T 
between two predictions. B mec i is evaluated from the median of the seismicity rate of the scenarios; B mean from 
the average seismicity rate; B meanl using the exponential A = cxp(log < n >) of the average of the logarithm 
of the seismicity rate; B^ is measured using the bare propagator to estimate the seismicity rate; B poisi using a 
Poisson process with a seismicity rate equal to the average value of the catalog; B poiS2 using a Poisson process with 
a probability defined as the fraction of intervals in the realized catalog that have at least a target event. N\ is the 
number of target events M > M t ; N2 is the number of intervals with at least one target event. Note that B me( i 
seems to be often the best for the smaller magnitudes while B mean is often the best for the largest magnitudes. 



T (days) 


M t 


JVi 


N 2 


B me d 








Bpoisi 


BpoiS2 


1. 


5.0 


2003 


1332 


-5997.7 


-5995.1 


-6341.0 


-6057.3 


-6361.8 


-6243.4 
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Table 3. (continued) 

T (days) M t Ni N 2 B med B meanl B mean B poisi B poiS2 



1. 


5.5 


637 


461 


-2512.4 


-2511.7 


-2614.6 


-2545.0 


-2678.7 


-2653.7 


1. 


6.0 


198 


159 


-1007.8 


-1006.9 


-1042.2 


-1023.2 


-1089.4 


-1085.0 


1. 


6.5 


66 


55 


-409.9 


-409.3 


-420.5 


-416.8 


-434.3 


-433.7 


1. 


7.0 


29 


27 


-217.8 


-216.6 


-218.1 


-224.0 


-233.3 


-232.1 


5. 


5.0 


2003 


1172 


-3626.3 


-3632.5 


-3851.7 


-3717.8 


-3862.8 


-3705.5 


5. 


5.5 


637 


420 


-1720.0 


-1719.1 


-1774.8 


-1765.4 


-1810.7 


-1774.3 


5. 


6.0 


198 


145 


-732.1 


-731.9 


-748.9 


-752.8 


-776.7 


-768.7 


5. 


6.5 


66 


53 


-321.2 


-320.5 


-322.0 


-331.3 


-335.4 


-334.5 


5. 


7.0 


29 


26 


-171.3 


-170.5 


-168.1 


- 179.3 


-183.5 


-182.7 


10. 


5.0 


2003 


1067 


-2651.0 


-2662.7 


-2822.4 


-2736.0 


-2852.4 


-2680.7 


10. 


5.5 


637 


400 


-1391.3 


-1393.0 


-1438.9 


-1439.9 


-1465.4 


-1424.7 


10. 


6.0 


198 


137 


-621.1 


-621.2 


-637.3 


-640.3 


-648.6 


-638.2 


10. 


6.5 


66 


50 


-276.6 


-276.2 


-280.1 


-286.1 


-285.2 


-283.7 


10. 


7.0 


29 


24 


-147.2 


-146.4 


-145.3 


-155.3 


-154.3 


-153.9 


50. 


5.0 


2003 


701 


-699.0 


-717.3 


-787.3 


-758.7 


-817.2 


-696.7 


50. 


5.5 


637 


329 


-658.9 


-666.0 


-698.8 


-702.7 


-706.2 


-662.8 


50. 


6.0 


198 


123 


-379.6 


-381.1 


-392.5 


-398.5 


-395.5 


-382.6 


50. 


6.5 


66 


48 


-192.2 


-191.5 


-192.5 


-204.3 


-197.9 


-196.2 


50. 


7.0 


29 


22 


-104.5 


-103.5 


-102.3 


-113.3 


-107.5 


-107.4 



